Comparing Distributed Indexing: To MapReduce or Not?

Authors

Richard M. C. McCreadie

Craig Macdonald

Iadh Ounis

Abstract

Information Retrieval (IR) systems require input corpora to be indexed. The advent of terabyte-scale Web corpora has reinvigorated the need for efficient indexing. In this work, we investigate distributed indexing paradigms, in particular within the MapReduce programming framework. Specifically, we describe two indexing approaches based on the original MapReduce paper, and compare these with a standard distributed IR system, the MapReduce indexing strategy used by the Nutch IR platform, and a more advanced MapReduce indexing implementation that we propose. Experiments using the Hadoop MapReduce implementation and a large standard TREC corpus show our proposed MapReduce indexing implementation to be more efficient than those proposed in the original paper.

Similar resources

Performance Optimization of a Distributed Transcoding System based on Hadoop for Multimedia Streaming Services

In recent times, Hadoop based on the MapReduce model has gained considerable attention because the features of the data preprocessing techniques are not time-consuming and are suitable for processing large-scale data. In particular, MapReduce is emerging as an important programming model for developing distributed data-processing applications such as web indexing, data mining, log file analysis, ...

A Survey on MapReduce Performance and Hadoop Acceleration

MapReduce is an implementation for generating large data sets with a parallel, distributed algorithm on a cluster. Hadoop is an open-source implementation of the MapReduce programming data model used for large-scale parallel applications such as web indexing, data mining, and scientific simulation. The Hadoop-A framework is able to levitate Hadoop acceleration and give significant performance compared to ...

Research on Multi-Tenant Distributed Indexing for SaaS Application

Multi-tenancy is the key feature of SaaS applications; however, the traditional indexing mechanism fails in a multi-tenant shared-schema database. This paper proposes a multi-tenant distributed indexing mechanism. We create a global index first and then create the local index with the MapReduce framework on Hadoop. We also propose a process for index update and index merging. Experimental r...

SciPDFindexer: Distributed Information Retrieval system using MapReduce

Indexing allows the conversion of raw document collections into easily searchable formats. Larger-scale indexing poses challenges in efficiently distributing indexing computation across a cluster of nodes. The MapReduce framework promises to be an effective tool for parallelizing tasks such as inverted index construction. We propose SciPDFindexer, a distributed information retrieval syste...

Of Ivory and Smurfs: Loxodontan MapReduce Experiments for Web Search

This paper describes Ivory, an attempt to build a distributed retrieval system around the open-source Hadoop implementation of MapReduce. We focus on three noteworthy aspects of our work: a retrieval architecture built directly on the Hadoop Distributed File System (HDFS), a scalable MapReduce algorithm for inverted indexing, and webpage classification to enhance retrieval effectiveness.

Publication date: 2009
